1.   INTRODUCTION

Perhaps the major complication associated with digital division
is best illustrated by your performing the following long-division
problem and noting carefully the steps you follow.


396A 


1057         6        2         1 

A1A2A3 


A  =  decimal  point  marker 


Your operations in selecting the first quotient digit are
summarized in the flow chart of Figure 1. The salient point is that
division  is  a  trial  and  error  process  requiring  an  initial  "guess"  of 
a  quotient  digit  followed  by  a  subtraction,  or  at  least  a  comparison,  to 
determine  whether  the  guess  is  correct.   If  it  is  not,  the  initial 
choice  is  modified  and  the  process  repeated.   It  is  the  trial  and  error 
nature  of  division,  whether  performed  by  man  or  machine,  which  complicates 
its  execution.   In  building  a  computer  arithmetic  unit,  division  is  the 
most  difficult  basic  operation  to  implement  efficiently. 

But despite the complexity, the literature is replete with
themes and variations for implementing digital division. Flores,
for example, states four methods for increasing speed of division and
then proceeds to describe no fewer than twenty-four schemes which
incorporate some or all of these speed-up techniques. MacSorley [2]
describes four division techniques demanding various divisor multiples
to accelerate execution.


* Numbers in brackets refer to the corresponding entry under References.


[Figure 1: flowchart. Legend: j = index; d = divisor; p_j = partial
remainder; p_0 = dividend; q_j = quotient digit]

FIGURE 1.  FLOWCHART OF MANUAL EXECUTION OF DIVISION


There is far less in the literature, however, describing
theory and analytic tools to be used in designing a division scheme.
Most of the articles describe schemes which are products more of art
than of science. This report is an attempt to contribute to the
science of computer arithmetic implementation.

This report describes a class of division techniques especially
suited for implementation in an electronic digital computer. For
historic reasons, this class will be referred to as SRT division. The
name is derived from the fact that the binary case of this type of
division was discovered independently, at about the same time, by
Dura Sweeney of IBM, J. E. Robertson of the University of Illinois,
and T. D. Tocher of Imperial College, London [3]. The paper, however,
incorporates more recent work, due exclusively to Professor Robertson,
which extends the binary SRT division to a radix higher than two.
Much of Chapter 2 is based upon his report [5] and upon numerous
personal communications.

After a description of the theory and properties of SRT
division, the report turns to the problem of actually implementing
the scheme and presents an example of one possible realization.


2.   THE  THEORY  OF  SRT  DIVISION 

2.0  Introduction

This chapter introduces a recursive relationship for describing
division and from it develops the nature of SRT division.
The discussion is augmented with two graphical representations: one
to determine the range restrictions associated with SRT, and the other
to aid in computing the "cost" of quotient digit selection.

Most of the following analysis will be developed for a
general radix, r. At first this generality may appear superfluous, for
after all, isn't a digital computer a binary machine, and doesn't binary
imply radix two? It is true that the basic storage elements of a
digital computer are two-state devices and that numbers are represented
internally by strings of "1's" and "0's". Computer arithmetic, however,
is often facilitated by considering groups of bits rather than each bit
individually. Such grouping may be interpreted as use of digits of
higher radix than two. For example, a pair of bits becomes one radix
four digit; a trio of bits, a radix eight (octal) digit.
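The grouping can be illustrated with a short sketch. The function name `group_bits` and the left-padding of short strings are illustrative assumptions, not part of the report.

```python
def group_bits(bits: str, bits_per_digit: int) -> list[int]:
    """Regroup a binary string into digits of radix 2**bits_per_digit."""
    # Assumption: pad with leading zeros so the length is a multiple
    # of the group size; this does not change the value represented.
    pad = (-len(bits)) % bits_per_digit
    padded = "0" * pad + bits
    return [int(padded[i:i + bits_per_digit], 2)
            for i in range(0, len(padded), bits_per_digit)]


radix4_digits = group_bits("110110", 2)  # pairs of bits: radix four
radix8_digits = group_bits("110110", 3)  # trios of bits: radix eight
```

The same bit string yields three radix four digits or two octal digits, which is the sense in which a higher radix shortens the digit string.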

In the literature of arithmetic unit design, one finds references
to such techniques as inspection of bits "two at a time," or
perhaps "generation of several quotient bits simultaneously". In
this report such techniques would be described in terms of higher radix
arithmetic.


2.1  The Recursive Relationship

Digital division as implemented in an electronic computer
consists of preliminary operations, i.e., normalization, a recursive
process, and a terminal operation, i.e., changing the form of the
remainder. Although preliminary and terminal operations vary from
machine to machine, they generally consume much less of the execution
time than the recursive operations. For restoring, non-restoring, and
the SRT division scheme to be described in this report, this recursive
relationship is defined by


    $p_{j+1} = r p_j - q_{j+1} d$                                  (2.1.1)


where  the  symbols  are  defined  as  follows: 

$j$ = the recursive index = 0, 1, ..., m-1
$p_j$ = the partial remainder used in the j-th cycle
$p_0$ = the dividend
$p_m$ = the remainder
$q_j$ = the j-th quotient digit, in which the quotient is of the form
        $q_0 . q_1 q_2 \cdots q_m$ (the point following $q_0$ is the
        radix point)
$m$ = the number of digits, radix r, in the quotient
$d$ = the divisor
$r$ = the radix


This  relationship  and  the  symbols  as  defined  will  be  used 
throughout  this  report.   The  relationship  is  used  specifically  in  the 
development  of  range  restrictions  on  the  partial  remainders  in  Section 
2.3. 
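As a minimal sketch of the recursion, the following iterates (2.1.1) with the conventional restoring digit set 0, 1, ..., r-1. The function name, the use of exact fractions, and the assumption that 0 <= dividend < divisor (so that $q_0 = 0$ and only $q_1 \ldots q_m$ are produced) are illustrative choices, not the report's.

```python
from fractions import Fraction


def restoring_divide(dividend, divisor, r, m):
    """Iterate p_{j+1} = r*p_j - q_{j+1}*d for m cycles.

    Assumes 0 <= dividend < divisor, so every digit lies in 0..r-1.
    Returns the digits q_1..q_m and the final partial remainder p_m.
    """
    p = Fraction(dividend)
    d = Fraction(divisor)
    digits = []
    for _ in range(m):
        shifted = r * p          # multiply by the radix (shift left)
        q = shifted // d         # largest digit leaving 0 <= p_{j+1} < d
        p = shifted - q * d      # subtract the divisor multiple
        digits.append(int(q))
    return digits, p


digits, remainder = restoring_divide(Fraction(1, 3), Fraction(3, 4), r=4, m=4)
```

Unrolling the recursion gives dividend = d * (sum of q_j * r^-j) + p_m * r^-m, which is a convenient identity for checking any digit selection rule.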

Although not germane to the theory of SRT division, it is
interesting to note in passing that this relation points to possibilities
for accelerating the execution of division. Verbally, the equation says
that each partial remainder must be multiplied by the radix ($r p_j$),
i.e., shifted left one digital position, and that the selected quotient
digit must then be multiplied by the divisor ($q_{j+1} d$) and subtracted
from this shifted partial remainder. The division process will thus be
accelerated if the shift and/or the subtraction time is decreased. In
practice, all values of $q_{j+1} d$ are stored in registers or are readily
available via shift gates from the register containing the divisor. The
rapid formation of $q_{j+1} d$ thus reduces to minimizing the necessity
for forming awkward multiples requiring an addition, and to accelerating
the selection of $q_{j+1} d$ at the divisor input to the adder/subtractor.

Secondly, note that the recursive index, j, is implicitly an
inverse function of the radix. When actually implemented on a machine,
digits of a higher radix than two are represented by two or more binary
bits. A string of $\ell$ binary digits (bits) is equivalent to $\ell/2$
radix four digits. In general, to $\ell$ bits of radix two there
correspond $m = \ell / \log_2 r$ digits of radix r, where for practical
cases $r = 2^n$, n = integer > 0. Thus to produce a quotient of given
precision, the number of iterations required, and, concomitantly, the
execution time, is decreased as the radix is increased.
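A small sketch of the relation $m = \ell / \log_2 r$ follows. The function name is hypothetical, and rounding up when the bit count is not a multiple of $\log_2 r$ is an assumption the report does not state.

```python
def quotient_digits(num_bits: int, radix: int) -> int:
    """Number of radix-r digits (iterations) covering num_bits bits."""
    bits_per_digit = radix.bit_length() - 1   # log2(r) for r = 2**n
    if radix < 2 or 2 ** bits_per_digit != radix:
        raise ValueError("radix must be 2**n with n > 0")
    # Assumption: round up when num_bits is not an exact multiple.
    return -(-num_bits // bits_per_digit)


iterations = {r: quotient_digits(48, r) for r in (2, 4, 16)}
```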


2.2  The Representation of Quotient Digits

As noted in the last section, the use of a higher radix reduces
the number of cycles required to perform a division of given precision.
The implementation of such a scheme may, however, be costly, and costlier
still if quotient digits are represented as they are in manual methods or
machine restoring division. In these cases quotient digits have the
values 0, 1, 2, ..., r-1. With the radix, r, equal to four, the possible
digit values are 0, 1, 2, and 3. A radix four restoring division
therefore requires that multiples of 1, 2, and 3 times the divisor be
available for subtraction from the partial remainder. The 1 times
multiple is of course readily available, and the 2 times multiple is
formed merely by shifting left one binary position; the 3 times multiple,
however, requires extra time and/or hardware. It may be formed by a
tripler circuit or by addition of 1 times and 2 times the divisor, the
sum then being stored in an auxiliary register. For radix eight,
multiples of 3, 5, and 7 times the divisor must be computed and stored.

With SRT division the problem of forming divisor multiples is
mitigated by using both plus and minus quotient digit values. The
quotient digits are of the form -n, -(n-1), ..., -1, 0, 1, ..., n, where
n is an integer such that $\frac{1}{2}(r-1) \le n \le r-1$. Within this
range the actual choice of n for a given r is largely a function of
design details. The choice is considered further in Section 2.6.

The necessity for the range restriction is as follows. At
least r unique digits are required to represent a number, radix r. In
the representation introduced above, there are 2n+1 unique digits,
thus the requirement $2n+1 \ge r$. On the other hand, for radix r, the
maximum value of a quotient digit, n, should not be greater than the
value of the maximum digit representable, thus $n \le r-1$. Combining
these two inequalities yields the restriction stated above.
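For a concrete feel for this restriction, the sketch below lists the permissible values of n for a given radix. The function name is illustrative only.

```python
def allowed_max_digits(r: int) -> list[int]:
    """All integers n with (r - 1)/2 <= n <= r - 1."""
    # r // 2 is the smallest integer not less than (r - 1)/2.
    return list(range(r // 2, r))


choices = {r: allowed_max_digits(r) for r in (2, 4, 8)}
```

For radix two only n = 1 is possible; for radix four the choice is between n = 2 and n = 3.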

With plus and minus quotient digits, a higher radix division
may be implemented with fewer awkward multiples of the divisor. Now
the quotient digits for a radix 4 division are -2, -1, 0, +1, +2. All
the necessary multiples of the divisor may be formed by shifting and
complementation and require no auxiliary registers.

The second, but probably more significant, consequence of this
representation of quotient digits is that it introduces redundancy into
the representation of the quotient. If $2n > r-1$, then there are more
symbols available to represent a number than actually necessary.
Some numerical values may therefore be represented in more than one
form. For example, with r = 4, n = 2, and with an overbar representing
negation, the number 6 could be represented as $12$ or $2\bar{2}$. As
explained in the next sections, this redundancy permits less precision in
comparing the divisor and partial remainder in selecting a quotient
digit. This statement seems intuitively correct since without
redundancy, each quotient digit may be represented only one way and thus
must be selected precisely. With redundancy, the selection of the
quotient digit, and thus the comparison of divisor and partial remainder,
need not be precise. This non-unique representation does, however,
complicate the division in that the redundant form must eventually be
converted to a conventional representation.
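The example of the number 6 can be reproduced with a brief sketch. The helper names and the brute-force enumeration are illustrative assumptions. The conversion shown simply re-expands the value with digits 0..r-1 and is not the conversion hardware the report discusses later.

```python
from itertools import product


def value(digits, r=4):
    """Value of a digit string, most significant digit first."""
    v = 0
    for digit in digits:
        v = v * r + digit
    return v


def representations(target, length, r=4, n=2):
    """All length-digit strings over -n..n whose value is target."""
    return [ds for ds in product(range(-n, n + 1), repeat=length)
            if value(ds, r) == target]


def to_conventional(digits, r=4):
    """Re-express a non-negative signed-digit string with digits 0..r-1."""
    v = value(digits, r)
    out = []
    for _ in digits:
        v, digit = divmod(v, r)
        out.append(digit)
    return out[::-1]


forms_of_six = representations(6, 2)
```

Both redundant forms convert to the same conventional digit string.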
2.3  Range Restrictions

With the quotient representation now defined, consider the
derivation of range restrictions on the partial remainders. Recall
from the manual execution of a division that in determining whether a
quotient digit is correct or not, one is essentially applying the
restriction that $0 \le p_{j+1} < d$, where $p_{j+1}$ is the result of
the subtraction of $q_{j+1}$ times the divisor from the j-th partial
remainder. If $p_{j+1}$ is not within this range then $q_{j+1}$ is
changed until it is. For non-restoring division, negative partial
remainders and negative quotient digits are allowable, thus the range
restriction is $|p_{j+1}| \le |d|$. It seems reasonable, therefore, to
hypothesize other division techniques for which $|p_{j+1}| \le k|d|$,
and which utilize the quotient digit representation introduced in the
last section. The upper limit on k will be 1. The lower limit, although
not yet obvious, is 1/2, thus $1/2 \le k \le 1$.

To show that this is in fact the case, first reconsider the
recursive relationship described in Section 2.1 and restated below.

    $p_{j+1} = r p_j - q_{j+1} d$                                  (2.3.1)

After $p_{j+1}$ is formed on the j-th cycle, it is multiplied by
the radix r (shifted left); j is increased by one and it becomes $r p_j$
of the present cycle. Since $|p_{j+1}| \le k|d|$, it follows that $p_j$
must obey the same restriction, i.e.,

    $r|p_j| \le r k|d|$                                            (2.3.2)

Substituting 2.3.1 into 2.3.2 yields

    $-kd \le r p_j - q_{j+1} d \le kd$                             (2.3.3)

At this point the divisor is assumed to be normalized, i.e.,
restricted to the range $1/2 \le d < 1$. Furthermore, (2.3.1) is
normalized with respect to the divisor and rewritten letting
$z_j = p_j/d$ and $z_{j+1} = p_{j+1}/d$.

    $z_{j+1} = r z_j - q_{j+1}$                                    (2.3.4)


Equation (2.3.4) may be interpreted graphically as a plot of
$z_{j+1}$ versus $r z_j$ with the quotient digit, $q_{j+1}$, as a
parameter. Such a representation shall be called a z-z plot. Recall
that the quotient digits assume values -n, -(n-1), ..., -1, 0, +1, ..., n.
Figure 2 is such a graph. To facilitate discussion, each plot
corresponding to a different quotient digit is called a q-line.

The goal of this section is to demonstrate that a correct
division procedure exists which incorporates the above range
restrictions and quotient representation. This existence is
substantiated if for each value of $r z_j$ in the allowed range there
corresponds a quotient digit and a $z_{j+1}$ also in their allowed
ranges. In terms of Figure 2, this means that for any point on the
$r z_j$ axis such that $-rk \le r z_j \le rk$, one must be able to move on
a line segment normal to the $r z_j$ axis and intersect a q-line at a
point corresponding to a $z_{j+1}$ within the range
$-k \le z_{j+1} \le k$. This allowed range is enclosed between the lines
$z_{j+1} = k$ and $z_{j+1} = -k$ in Figure 2.

To satisfy the foregoing requirements, the maximum value of
$r z_j$, i.e., $rk$, must occur at the intersection of $z_{j+1} = k$ and
the q-line $z_{j+1} = r z_j - n$. Similarly, the minimum value must occur
at the intersection of $z_{j+1} = -k$ and the q-line
$z_{j+1} = r z_j + n$. These bounds on $r z_j$ are indicated by the dashed
vertical lines of Figure 2.

Figure 2 now points to the value of k in terms of r and n.
At the upper right vertex of the bounding rectangle,
$z_{j+1} = k = r z_j - n$. But since $r z_j = rk$,

    $k = \frac{n}{r-1}$                                            (2.3.5)

The division is now characterized by tangible parameters, namely the
radix and the maximum value of quotient digits. Combining (2.3.5)
with the restriction on n, $\frac{r-1}{2} \le n \le r-1$, verifies the
statement at the beginning of this section, $1/2 \le k \le 1$.
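The existence argument can be spot-checked numerically. The sketch below is an illustration under stated assumptions: the function names and the sampling of the $r z_j$ axis at evenly spaced exact fractions are not from the report, and a finite sample is a sanity check rather than a proof.

```python
from fractions import Fraction


def valid_digits(rz, r, n):
    """Digits q in -n..n for which z_next = rz - q obeys |z_next| <= k."""
    k = Fraction(n, r - 1)
    return [q for q in range(-n, n + 1) if abs(rz - q) <= k]


def every_point_covered(r, n, steps=240):
    """Check sampled points of -rk <= rz <= rk for at least one valid digit."""
    k = Fraction(n, r - 1)
    lo, hi = -r * k, r * k
    for s in range(steps + 1):
        rz = lo + (hi - lo) * Fraction(s, steps)
        if not valid_digits(rz, r, n):
            return False
    return True


radix4_ok = every_point_covered(4, 2)      # n within the allowed range
radix4_bad = every_point_covered(4, 1)     # n below (r - 1)/2
```

With r = 4 and n = 1 the q-lines leave gaps on the $r z_j$ axis, which is the failure the lower bound on n rules out.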


2.4  Redundancy in the Quotient Representation

Section 2.2 indicated that the quotient digit representation
of SRT division introduces redundancy into the quotient. This fact is
also manifested in Figure 2 in the regions on the $r z_j$ axis for which
either one of two q-lines may be legitimately selected. For example,
at point A one may move vertically upward to the $q_{j+1} = 0$ line or
downward to the $q_{j+1} = +1$ line. In either case the quotient digit is
correct. Figure 3, a specific case of Figure 2, testifies to the fact
that this freedom of choice is not merely the result of an inaccurately
drawn graph. Here r = 4, n = 2. The vertical dashed lines define the
overlap regions.

The production of a redundant quotient requires extra hardware,
and perhaps time, to convert it to a conventional binary representation
acceptable to programmers and other sections of a machine.
This conversion is discussed at greater length in Section 2.7. The
conclusion of that section is that the positive consequences of a
freedom in quotient digit selection overshadow the cost of conversion.
With no redundancy, the divisor and the shifted partial remainder must
be compared (usually by subtraction) to the full precision defined for
the machine. With redundancy, the designer is at liberty to inspect
fewer bits of the divisor and shifted partial remainder than define
full precision. Handling fewer bits may save time and hardware;
these ramifications are explored further in the chapter concerning
implementation. In Figure 3, for example, a correct quotient digit is
selected knowing $r z_j = r p_j / d$ to a precision only great enough to
contain it within an overlap region. Exactly what precision is required
for a given value of r and n is the subject of the next section.

In terms of z-z plots such as Figures 2 and 3, the redundancy
is proportional to the width of the overlap regions. The width
of this region in terms of n and r is found as follows. Consider two
adjacent lines of Figure 2, i.e., $z_{j+1} = r z_j - i$ and
$z'_{j+1} = r z_j - (i-1)$. The overlap, $\Delta r z_j$, is the difference
between $r z_j$ for $z'_{j+1} = \frac{n}{r-1}$ and $r z_j$ for
$z_{j+1} = -\frac{n}{r-1}$. Solving for this difference yields
$\Delta r z_j = \frac{2n}{r-1} - 1$. The ratio $\frac{n}{r-1}$ is
therefore a measure of redundancy.
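A worked instance of the overlap width, as a sketch with a hypothetical helper name: the interval returned is the stretch of the $r z_j$ axis on which both digit i and digit i-1 are valid.

```python
from fractions import Fraction


def overlap_interval(i, r, n):
    """Range of r*z_j where both q = i and q = i - 1 keep |z_next| <= k."""
    k = Fraction(n, r - 1)
    lower = i - k          # q = i line meets z_next = -k
    upper = (i - 1) + k    # q = i - 1 line meets z_next = +k
    return lower, upper


lo, hi = overlap_interval(1, r=4, n=2)
width = hi - lo            # equals 2n/(r-1) - 1
```

At the minimum n = (r-1)/2 the width is zero, i.e., there is no redundancy; at n = r-1 it reaches one full unit.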




As  redundancy  (width  of  overlap  region)  is  increased,  the 
required  precision  of  inspection  of  divisor  and  partial  remainder,  and 
thus  hopefully  the  execution  time,  is  decreased.   It,  therefore,  appears 
that  for  a  given  r,  n  should  be  as  large  as  possible,  i.e.,  n  should 
equal  r-1.   Such  a  choice  may  not  be  practical,  however,  since  n  =  h, 
requires  the  ability  to  form  h  multiples  of  the  divisor.   The  choice 
of n is therefore bound up in the usual trade-off between time and
hardware.

2.5  The P-D Plot

Now consider another graphical representation of the division
procedure. This construction, suggested by C. V. Freiman of the IBM
Corporation [5], is useful in further describing SRT division and in
computing the required precision of inspection of the divisor and
shifted partial remainder. The basis for the plot is the recursive
relationship

    $p_{j+1} = r p_j - q_{j+1} d$                                  (2.1.1)

as described in Section 2.1 together with the range restriction

    $|p_{j+1}| \le \frac{n}{r-1} |d|$

developed in Section 2.3. The figure is thus essentially a plot of
partial remainder versus divisor values and therefore in this report
shall be referred to as a P-D plot.
Solving the recursive relationship for $r p_j$ yields

    $r p_j = p_{j+1} + q_{j+1} d$.                                 (2.5.1)

For a fixed quotient digit, the upper limit of $r p_j$ as a function of
the divisor, d, occurs when $p_{j+1}$ is maximum, i.e., when

    $p_{j+1} = \frac{n}{r-1} d$

thus

    $r p_{j\,max} = \left( \frac{n}{r-1} + q_{j+1} \right) d$.     (2.5.2)

Likewise, the lower limit occurs with $p_{j+1} = -\frac{n}{r-1} d$, thus

    $r p_{j\,min} = \left( -\frac{n}{r-1} + q_{j+1} \right) d$.    (2.5.3)

These linear equations may be plotted as functions of d with $q_{j+1}$ as
a parameter ranging from -n to +n in steps of 1. The area between
$r p_{j\,max}$ and $r p_{j\,min}$ for a given $q_{j+1} = i$ will be denoted
the q(i) area.

The division procedure is now determined. A given value of
divisor, d, and the j-th shifted partial remainder will specify a point
in a q(i) area. The digit i will be the value of the next quotient
digit $q_{j+1}$, which in turn is used in forming the next partial
remainder. In this representation the redundancy is manifested as
overlapping of the q(i) regions, i.e., some pairs of d and $r p_j$ will
specify a point for which either $q_{j+1} = i$ or $q_{j+1} = i - 1$ is a
valid choice.
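The procedure just described can be sketched in a few lines. This is an illustration under stated assumptions rather than the report's mechanism: the function names are hypothetical, the divisor is taken positive, the dividend is assumed to satisfy $|p_0| \le \frac{n}{r-1} d$, and the point (d, $r p_j$) is located exactly instead of from a few high order bits. The `pick` argument decides which digit to use wherever two q(i) areas overlap.

```python
from fractions import Fraction


def valid_choices(rp, d, r, n):
    """All digits i whose q(i) area contains the point (d, rp), d > 0."""
    k = Fraction(n, r - 1)
    return [i for i in range(-n, n + 1) if (i - k) * d <= rp <= (i + k) * d]


def srt_divide(dividend, divisor, r=4, n=2, m=8, pick=min):
    """Run m cycles of p_{j+1} = r*p_j - q_{j+1}*d with P-D plot selection."""
    p, d = Fraction(dividend), Fraction(divisor)
    digits = []
    for _ in range(m):
        rp = r * p
        q = pick(valid_choices(rp, d, r, n))  # any valid digit will do
        p = rp - q * d
        digits.append(q)
    return digits, p


def quotient_value(digits, r):
    """Value of the fraction 0.q_1 q_2 ... q_m in radix r."""
    return sum(Fraction(q, r ** (j + 1)) for j, q in enumerate(digits))


low_digits, low_rem = srt_divide(Fraction(1, 4), Fraction(3, 4), pick=min)
high_digits, high_rem = srt_divide(Fraction(1, 4), Fraction(3, 4), pick=max)
```

The two runs choose different digits inside the overlap yet satisfy the same identity dividend = d * quotient + p_m * r^-m, which is the redundancy at work.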

Figure 4 is an example of a P-D plot for a division with
r = 4, n = 2. The equations for the lines plotted, 2', 2, etc., are
given in Table 1. The region for which $q_{j+1} = 2$ is a valid choice,
i.e., the q(2) area, is between lines 2' and 2; the q(1) area is between
lines 1' and 1, and so forth. Note the overlap between q(i) areas,
for example, the region between lines 1' and 2 in which either the choice
$q_{j+1} = 1$ or $q_{j+1} = 2$ is correct. Note further that the figure is
symmetric about both axes.

On the right half of Figure 4 (the same may be done on the
left), "steps" have been drawn within the overlap of the q(i) regions.
The width of a "tread" (constant $r p_j$, d varying) defines a divisor
interval; the value of $r p_j$ for each tread defines a comparison
constant; the distance between comparison constants defines a partial
remainder interval. Phrased in this terminology, division consists of
locating a given divisor value within the appropriate divisor interval,
locating the shifted partial remainder within the appropriate interval
(using comparison constants), and selecting a value of $q_{j+1}$ enclosed
by the intersection of the boundaries of these intervals. Since a
divisor and partial remainder must be located only to within an
interval, they need not be inspected to full precision in selecting a
correct quotient digit. Here is where the redundancy pays dividends.


    $r p_j = \pm \frac{n}{r-1} d + q_{j+1} d$,   r = 4

    $q_{j+1}$     $p_{j+1}$     Equation: $r p_j =$     Designation in Figure 4
    ---------     ---------     -------------------     -----------------------
    2             2/3 d         8/3 d                   2'
    2             -2/3 d        4/3 d                   2
    1             2/3 d         5/3 d                   1'
    1             -2/3 d        1/3 d                   1
    0             2/3 d         2/3 d                   0'
    0             -2/3 d        -2/3 d                  0
    $\bar{1}$     2/3 d         -1/3 d                  $\bar{1}'$
    $\bar{1}$     -2/3 d        -5/3 d                  $\bar{1}$
    $\bar{2}$     2/3 d         -4/3 d                  $\bar{2}'$
    $\bar{2}$     -2/3 d        -8/3 d                  $\bar{2}$

Table 1.  Equations Defining the Regions of Figure 4.
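The entries of Table 1 follow mechanically from (2.5.2) and (2.5.3). A short sketch that regenerates the coefficients of d is given below; the function name and the tuple layout are illustrative assumptions.

```python
from fractions import Fraction


def pd_boundaries(r=4, n=2):
    """Rows (q, p_next/d, rp/d) for each q(i) boundary, in Table 1 order."""
    k = Fraction(n, r - 1)
    rows = []
    for q in range(n, -n - 1, -1):        # q = n down to -n
        for p_coeff in (k, -k):           # upper boundary, then lower
            rows.append((q, p_coeff, p_coeff + q))
    return rows


for q, p_coeff, rp_coeff in pd_boundaries():
    print(f"q = {q:2d}   p_next = {p_coeff} d   r*p = {rp_coeff} d")
```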

Techniques for selecting divisor intervals and comparison
constants are detailed in the next two sections. At this point, however,
we shall make several general observations. First, as we shall soon
discover, the comparison constants are compared with the high order
$N_p$ bits of the shifted partial remainder and, similarly, the end points
of the divisor intervals are compared with the $N_d$ high order bits of
the divisor. The comparison constants and end points of the divisor
intervals should therefore be numbers which are representable with
$N_p$ and $N_d$ bits, respectively. The choices illustrated in Figure 4,
which maximized the width of the divisor intervals, do not meet this
requirement.

In Figure 5, however, more practical choices are shown. The
dashed lines represent the theoretical choices used in Figure 4. Now,
although the number of steps has been increased, the boundaries fall
at points easily representable in binary notation. Note that inspection
of 4 bits plus sign of the partial remainder and divisor is
sufficient to locate the correct choice of quotient digit.

The second observation is that the choice of divisor intervals
and comparison constants is bound up with the required precision
of inspection of the partial remainder and divisor; if, for example,
the divisor interval widths are increased, the required precision
of divisor inspection (number of bits) may be decreased. Furthermore,
the maximum precision of inspection of the divisor is determined
by the divisor interval of smallest width. By inspection of Figure 5,
the reader might guess where this step is, but we shall now locate
it analytically. The result of this derivation will be useful in the
next sections.

The length of a divisor interval is limited by the boundaries
of the overlap region. The maximum precision of inspection is required
where the divisor interval is minimum. To determine where this
minimum divisor interval occurs, consider the detail of the overlap
of the q(i) and q(i-1) regions shown in Figure 6.


For a given value of $r p_j$, the maximum width of a divisor
interval is

[Figure 6 line labels: $r p_j = [n/(r-1) + i - 1] d$;
$r p_j = [-n/(r-1) + i] d$; vertical axis $r p_j$]

FIGURE 6.  DETAIL OF A P-D PLOT OVERLAP REGION
dividend and a 13 bit divisor. The results of this limited precision
division (eight bits) are returned to the full precision mechanism as
part of the full precision quotient and are used in forming the next
full precision partial remainder. Note that the number defining full
precision may be changed in discrete steps by changing the number of
"calls" to the model division. Furthermore, the model division scheme
may be quite different from that of the full precision division.

For purposes of computing costs of quotient selection, we
shall consider two classes of model division procedures. The first
will be those involving the use of an auxiliary arithmetic unit and
employing addition and/or subtraction in forming the quotient digits.
Examples of schemes in this class include a radix four SRT division
performed in the exponent arithmetic unit or the procedure suggested
by Wallace [9], which is logically equivalent to forming the approximate
reciprocal of the divisor and multiplying by the partial remainder.
This class will be referred to as arithmetic models.

The  second  class  consists  of  those  methods  which  are  the 
logical  equivalent  of  a  table  look-up.   This  technique  may  be  viewed 
as  the  direct  implementation  of  a  P-D  plot,  i.e.,  decoding  the  divisor 
interval,  the  partial  remainder  interval  and  producing  the  quotient 
digit  indicated  by  their  intersection.   This  class  will  be  referred 
to  as  table  look-up  models. 

Before considering these two types of model in further detail,
let us state more precisely the conditions which must be met in
the choice of model division and precision of inspection. Let

$m$ = the number of bits to the right of the radix point
      of divisor and dividend.
$\widehat{rp}_j$ = the truncated version of the shifted partial
      remainder.
$\epsilon$ = the number of bits to the right of the radix point
      in $\widehat{rp}_j$.
$\Delta p = \pm(2^{-\epsilon} - 2^{-m}) \approx \pm 2^{-\epsilon}$, the
      uncertainty in $\widehat{rp}_j$.
$\hat{d}$ = the truncated version of the divisor.
$\delta$ = the number of bits to the right of the radix point
      in $\hat{d}$.
$\Delta d = \pm(2^{-\delta} - 2^{-m}) \approx \pm 2^{-\delta}$, the
      uncertainty in $\hat{d}$.

The following cost criterion summarizes the requirements on
the quotient selection mechanism, $\Delta d$ and $\Delta p$.

Cost criterion: Given the approximations $\widehat{rp}_j + \Delta p$ and
$\hat{d} + \Delta d$, the integer result of $\widehat{rp}_j / \hat{d} = i$
performed in the model must be such that on the appropriate P-D plot, the
rectangle defined by ($\hat{d} + \Delta d$, $\widehat{rp}_j + \Delta p$)
is entirely within the q(i) region.
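The cost criterion lends itself to a direct check. The sketch below is an illustration under assumptions: the function name is hypothetical, the uncertainties are treated as symmetric magnitudes about the truncated values, and the divisor range is assumed to stay positive. Because a q(i) region with d > 0 is bounded by two straight lines through the origin, it is convex, so testing the four corners of the rectangle suffices.

```python
from fractions import Fraction


def rectangle_in_region(d_hat, rp_hat, delta_d, delta_p, i, r, n):
    """True if the uncertainty rectangle lies wholly inside the q(i) region.

    Assumes d_hat - delta_d > 0 and symmetric uncertainties.
    """
    k = Fraction(n, r - 1)
    for d in (d_hat - delta_d, d_hat + delta_d):
        for rp in (rp_hat - delta_p, rp_hat + delta_p):
            if not ((i - k) * d <= rp <= (i + k) * d):
                return False
    return True


# r = 4, n = 2, digit 2, near d = 1/2 with two trial precisions.
fine = rectangle_in_region(Fraction(1, 2), Fraction(3, 4),
                           Fraction(1, 64), Fraction(1, 32), 2, 4, 2)
coarse = rectangle_in_region(Fraction(1, 2), Fraction(3, 4),
                             Fraction(1, 16), Fraction(1, 16), 2, 4, 2)
```

The coarser inspection lets a corner of the rectangle fall below the q(2) region, so the criterion fails there.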

2.6.2   Cost  Determination  for  an  Arithmetic  Model 

We first consider the determination of the cost for a
division using an arithmetic model. In this case $\widehat{rp}_j$ and
$\hat{d}$ are presented to a limited precision arithmetic unit and the
division carried out to produce a rounded integer quotient. If the bit
position to the right of the radix point in the model is "1", the integer
portion is increased by one and truncated; otherwise the result is
merely truncated. This rounding is necessary if the cost criterion is
to hold for an arithmetic model.

Equation 2.5.4 indicated that maximum precision is required
in the overlap of the q(n) and q(n-1) regions in the vicinity of
d = 1/2. The precision determined here will be sufficient for any
other region of the P-D plot. Figure 7 is a detail of this region.

Two additional factors must now be considered: a redundantly
represented partial remainder and a negative divisor. As illustrated
in the next chapter, a division scheme which meshes well with
multiplication must cope with redundantly represented partial remainders.
One consequence of the representation is that the truncation error
($\Delta p$) attributable to considering only a few higher order bits of
the partial remainder may be either positive or negative. When a negative
(2's complement) divisor is permitted, truncation error may also be
negative.

In the divisor interval $1/2 + \Delta d$, the dividing line between
the selection of q = n and q = n-1 is $r p_j = \frac{1}{2}(n - 1/2)$,
since $r p_j / d = 2 \times \frac{1}{2}(n - 1/2) = n - 1/2$, which must be
rounded to n. For the cost criterion to hold, the rectangle
($1/2 + \Delta d$, $\frac{1}{2}(n - 1/2) + \Delta p$) must not extend
below the bottom of the overlap region defined by $r p_j = (n - 2/3) d$.
Such a rectangle is indicated by the dashed lines in Figure 7. Since
this rectangle is not unique, there is some available trade-off between
$\Delta p$ and $\Delta d$. To achieve more quantitative results, we now
limit the analysis to a special but useful case: that in which the radix
is of the form $r = 2^{2k}$, where k is a positive (non-zero) integer.

[Figure 7 line labels: $r p_j = (n - 1/3) d$; $r p_j = (n - 2/3) d$;
$r p_j = \frac{1}{2}(n - 1/2)$; horizontal axis d]

FIGURE 7.  COST CALCULATION FROM P-D PLOT

A division with $r = 2^{2k}$ may be implemented with a cascade of $k$
adder/subtractors with multiples of 1 times and 2 times the divisor
available to the first stage of the cascade, 4 times and 8 times to the
second, and so forth through $2^{2k-2}$ times and $2^{2k-1}$ times
available to the $k$th stage. In this case, $n$, the largest multiple of
the divisor which may be formed, is the sum of the largest multiple which
may be formed at each stage in the cascade; i.e.
$n = 2 + 8 + \ldots + 2^{2k-1}$. Furthermore, the sum of this geometric
series gives $n/(r-1) = 2/3$. Thus we shall consider the case
$r = 2^{2k}$, $n = \frac{2}{3}(r-1)$.
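The relation between the cascade and $n$ can be checked with a few lines of Python. This is a sketch under the assumption that stage $i$ (counting from 1) offers $2^{2i-2}$ and $2^{2i-1}$ times the divisor, as described above; the function names are hypothetical.

```python
def cascade_multiples(k):
    """Divisor multiples offered to each of the k cascade stages.

    Stage i (1-based) is assumed to offer 2**(2i-2) and 2**(2i-1) times
    the divisor, i.e. (1, 2), (4, 8), ... as described in the text.
    """
    return [(2 ** (2 * i - 2), 2 ** (2 * i - 1)) for i in range(1, k + 1)]

def max_quotient_digit(k):
    """Largest formable multiple n: the sum of the larger multiple per stage."""
    return sum(larger for _, larger in cascade_multiples(k))

for k in range(1, 5):
    r = 2 ** (2 * k)
    n = max_quotient_digit(k)
    # n/(r-1) = 2/3 for every k, checked here without floating point.
    assert 3 * n == 2 * (r - 1)
    print(k, r, n)
```

For $k = 1, 2, 3, 4$ the loop prints the radix and $n$ pairs (4, 2), (16, 10), (64, 42), and (256, 170).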

For practical implementation, the rectangular region defined by
$\Delta d$ and $\Delta p$ will be symmetric about $d = 1/2$ and
$rp_j = \frac{1}{2}(n - \frac{1}{2})$. Referring to Figure 7, note that
$\Delta d$ must be smaller than the smaller of $\Delta d_{1\,max}$ and
$\Delta d_{2\,max}$. The following demonstrates that
$\Delta d_{1\,max} < \Delta d_{2\,max}$.

$$\Delta d_{2\,max} = \frac{1}{2}\left(\frac{n - 1/2}{n - 2/3} - 1\right) \tag{2.6.1}$$

$$\Delta d_{1\,max} = \frac{1}{2}\left(-\frac{n - 1/2}{n - 1/3} + 1\right)$$

$$\Delta d_{1\,max} - \Delta d_{2\,max} = 1 - \frac{n^2 - n + 1/4}{n^2 - n + 2/9} \tag{2.6.2}$$

Since

$$\frac{n^2 - n + 1/4}{n^2 - n + 2/9} > 1$$

$$\Delta d_{1\,max} - \Delta d_{2\,max} < 0$$

$$\Delta d_{1\,max} < \Delta d_{2\,max} \tag{2.6.3}$$

Thus choosing $\Delta d \le \Delta d_{1\,max}$ will ensure that the
rectangle will fit horizontally.
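As a worked check (an added step, not part of the original derivation), both bounds of 2.6.1 reduce to simple closed forms, which makes the ordering immediate:

```latex
\Delta d_{1\,max} = \frac{1}{2}\cdot\frac{(n-\frac{1}{3})-(n-\frac{1}{2})}{n-\frac{1}{3}}
                  = \frac{1}{12\,(n-\frac{1}{3})},
\qquad
\Delta d_{2\,max} = \frac{1}{2}\cdot\frac{(n-\frac{1}{2})-(n-\frac{2}{3})}{n-\frac{2}{3}}
                  = \frac{1}{12\,(n-\frac{2}{3})}.
```

Since $n - 1/3 > n - 2/3 > 0$, the first bound is the smaller one. For radix four ($n = 2$), for instance, the two values are 1/20 and 1/16.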

Similarly

$$\Delta p_1 = (n - 1/3)d_1 - \frac{1}{2}(n - \frac{1}{2}) \tag{2.6.4}$$

$$\Delta p_2 = -(n - 2/3)d_2 + \frac{1}{2}(n - \frac{1}{2})$$

$$\Delta p_1 - \Delta p_2 = (n - 1/3)d_1 + (n - 2/3)d_2 - (n - 1/2) \tag{2.6.5}$$

Let

$$d_1 = 1/2 - \Delta d, \qquad d_2 = 1/2 + \Delta d \tag{2.6.6}$$

Substituting (2.6.6) into (2.6.5) yields

$$\Delta p_1 - \Delta p_2 = -\frac{\Delta d}{3} \le 0$$

thus

$$\Delta p_1 \le \Delta p_2 \tag{2.6.7}$$

As implied earlier, if we are certain that
$rp_j = \frac{1}{2}(n - \frac{1}{2})$ will produce the quotient selection
$q_{j+1} = n$, then $\Delta p \le \Delta p_2$ will be sufficient. If we
cannot guarantee this, then $\Delta p \le \Delta p_1$ must hold.

We shall adopt the latter, more cautious approach. If we selected the
former, then the $(n - 1/3)$ term in equation 2.6.13 would be replaced by
$(n - 2/3)$. The results in Table 2, however, will be the same.
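Carrying out the substitution of 2.6.6 explicitly (a worked step added for clarity, using only the definitions in 2.6.4 to 2.6.6) gives:

```latex
\Delta p_1 = (n-\tfrac{1}{3})(\tfrac{1}{2}-\Delta d) - \tfrac{1}{2}(n-\tfrac{1}{2})
           = \tfrac{1}{12} - (n-\tfrac{1}{3})\,\Delta d,
\\
\Delta p_2 = -(n-\tfrac{2}{3})(\tfrac{1}{2}+\Delta d) + \tfrac{1}{2}(n-\tfrac{1}{2})
           = \tfrac{1}{12} - (n-\tfrac{2}{3})\,\Delta d,
\\
\Delta p_1 - \Delta p_2 = -\tfrac{1}{3}\,\Delta d \le 0 .
```

The first line is where the 1/12 term and the $(n - 1/3)$ factor of equations 2.6.12 and 2.6.13 come from.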

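Recalling that $\Delta d = 2^{-\delta}$, we want

$$2^{-\delta} \le \Delta d_{1\,max} \tag{2.6.8}$$

which from 2.6.1 becomes

$$2^{-\delta} \le \frac{1}{2}\left(1 - \frac{n - 1/2}{n - 1/3}\right) \tag{2.6.9}$$

where

$$n = \frac{2}{3}(2^{2k} - 1)$$

Let

    I(x) = x if x is an integer,
         = next larger integer if x is not an integer.

The minimum value of $\delta$ is therefore

$$\delta_{min} = I\left[-\log_2\left(\frac{1}{2}\left(1 - \frac{n - 1/2}{n - 1/3}\right)\right)\right] \tag{2.6.10}$$

Possible values of $\delta$ are thus

$$\delta = \delta_{min},\ \delta_{min} + 1,\ \ldots,\ m \tag{2.6.11}$$

Similarly, since $\Delta p = 2^{-\epsilon}$, combining 2.6.7 and 2.6.4
yields

$$2^{-\epsilon} \le 1/12 - 2^{-\delta}(n - 1/3) \tag{2.6.12}$$

and thus

$$\epsilon = I\left[-\log_2\left(1/12 - 2^{-\delta}(n - 1/3)\right)\right] \tag{2.6.13}$$

where $\delta$ is defined by 2.6.11.

Now let

    $N_d$ = number of bits of $d$ = $\delta$

    $N_p$ = number of bits of $rp_j$ = $\epsilon + 2k$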


Note also that the sign of $d$ and $rp_j$ must be known to the model.
Table 2 summarizes the results of equations 2.6.11 and 2.6.13 for
k = 1, 2, 3, 4. Note that $\epsilon$ approaches a lower limit of 4 when
the 1/12 term in 2.6.13 becomes dominant.

     k      r      n    δ_min      δ      ε     N_d    N_p

     1      4      2      5        5      5      5      7
                                   6      5      6      7
                                   m      4      m      6

     2     16     10      7        7      7      7     11
                                   8      5      8      9
                                   9      4      9      8
                                  10      4     10      8

     3     64     42      9        9      9      9     15
                                  10      5     10     11
                                  11      4     11     10
                                  12      4     12     10

     4    256    170     11       11     11     11     19
                                  12      5     12     13
                                  13      4     13     12
                                  14      4     14     12
                                   m      4      m     12


Table 2. Costs for Arithmetic Models

Thus it appears there are three feasible cases for which the cost of
inspection is as follows:

Case 1
    $N_p = 4k + 3$
    $N_d = 2k + 3$

Case 2
    $N_p = 2k + 5$
    $N_d = 2k + 4$

Case 3
    $N_p = 2k + 4$
    $N_d = 2k + 5$

Case three would probably be the most practical case to implement since
$N_p$ is minimum. $N_p$ bits of the redundantly represented partial
remainder must be converted into conventional form before each model
division. Since this assimilation is essentially a serial process, the
assimilation time is directly proportional to $N_p$.
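The entries of Table 2 and the three cases can be recomputed from equations 2.6.10 and 2.6.13. The following Python sketch does this with exact fractions; the function names are hypothetical, and the helper `ceil_neg_log2` stands in for the rounding function of 2.6.10 on arguments between 0 and 1.

```python
from fractions import Fraction

def ceil_neg_log2(x):
    """Smallest integer e >= 0 with 2**-e <= x, for an exact x > 0."""
    e = 0
    while Fraction(1, 2 ** e) > x:
        e += 1
    return e

def arithmetic_model_costs(k, count=3):
    """Rows (delta, epsilon, N_d, N_p) following 2.6.10 and 2.6.13."""
    r = 2 ** (2 * k)
    n = Fraction(2 * (r - 1), 3)
    half_line = (n - Fraction(1, 2)) / 2          # rp_j = 1/2 (n - 1/2)
    dd1_max = Fraction(1, 2) - half_line / (n - Fraction(1, 3))
    delta_min = ceil_neg_log2(dd1_max)
    rows = []
    for delta in range(delta_min, delta_min + count):
        dp1 = Fraction(1, 12) - Fraction(1, 2 ** delta) * (n - Fraction(1, 3))
        assert dp1 > 0
        eps = ceil_neg_log2(dp1)
        rows.append((delta, eps, delta, eps + 2 * k))
    return rows

for k in range(1, 5):
    print(k, arithmetic_model_costs(k))
```

For k = 4 the three rows correspond to Cases 1, 2, and 3: $(N_d, N_p)$ = (11, 19), (12, 13), and (13, 12).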

2.6.3   Cost  Determination  for  a  Table  Look-Up  Model 

This class of model is a logical implementation of the P-D diagram. In
its most brute force form, this model may be viewed as a grid or matrix
with vertical lines which are the outputs of decoders applied to $d$ and
with horizontal lines which are the outputs of the decoders applied to
$rp_j$. At each intersection of the lines is an AND gate with one input
connected to the vertical line, the other to the horizontal line. Each
point of intersection corresponds to a quotient digit value, $i$, and
thus the output of each AND gate is connected to the input of the
appropriate OR gate, the true output of which is $q_{j+1} = i$.
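The grid can be pictured in software as a function from one cell of truncated $(d, rp_j)$ values to the quotient digits that are safe anywhere in that cell. The Python sketch below is illustrative only. It assumes non-negative operands, assumes truncation errors are positive (so a cell extends up and to the right of its truncated corner), and assumes digit $i$ may be selected wherever $(i - 2/3)d \le rp_j \le (i + 2/3)d$, which agrees with the overlap lines of Figure 7. The function name is hypothetical.

```python
from fractions import Fraction

def admissible_digits(d_hat, rp_hat, delta, eps, n):
    """Quotient digits that are safe for one cell of the look-up grid.

    d_hat and rp_hat are the truncated values (lower-left corner of the
    cell); the cell is 2**-delta wide and 2**-eps high.
    """
    rho = Fraction(2, 3)
    d_lo = Fraction(d_hat)
    d_hi = d_lo + Fraction(1, 2 ** delta)
    p_lo = Fraction(rp_hat)
    p_hi = p_lo + Fraction(1, 2 ** eps)
    safe = []
    for i in range(0, n + 1):
        # Highest point of the lower boundary of region i over the cell.
        bottom = max((i - rho) * d_lo, (i - rho) * d_hi)
        # Lowest point of the upper boundary of region i over the cell.
        top = (i + rho) * d_lo
        if p_lo >= bottom and p_hi <= top:
            safe.append(i)
    return safe
```

With radix four ($n = 2$) and $\delta = \epsilon = 4$, the cell at $d = 1/2$, $rp_j = 3/4$ admits either 1 or 2, while the cell one step higher, at $rp_j = 13/16$, admits only 2. An AND gate at each intersection then feeds the OR gate of one admissible digit.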

The overlap regions are divided by steps as discussed in Section 2.5
such that the cost criterion (Section 2.6.1) will hold in all intervals.
To determine the required $N_p$ and $N_d$ in this case, we again consider
the worst case region of the P-D plot where $d = 1/2$ and between q(n)
and q(n-1) as shown in Figure 7.

Again, if we choose the dividing line between $q_{j+1} = n$ and
$q_{j+1} = n-1$ to be at $\frac{1}{2}(n - \frac{1}{2})$, then the
calculations of Section 2.6.2 also hold for the table look-up case with
$r = 2^{2k}$. Recall, however, that we generally wish to minimize $N_p$
since this will reduce the assimilation time in forming $rp_j$ in each
cycle. We can accomplish this by selecting the comparison constants, the
dividing line between choice of quotient digit values, as close to the
top of an overlap region as possible.

In the arithmetic models, the comparison constants are implicit in the
model, and thus, for example, we had no choice but to use
$\frac{1}{2}(n - \frac{1}{2})$ in the cost calculations. In the present
case, however, we may select any value which is within the overlap
region and an integer multiple of $2^{-\epsilon}$.

The value of $\frac{1}{2}(n - \frac{1}{2})$ is always an exact binary
number, specifically a number with a fractional part of 3/4. The distance
from $\frac{1}{2}(n - \frac{1}{2})$ to the upper limit of the overlap
region along $d = 1/2$ is
$\frac{1}{2}(n - \frac{1}{3}) - \frac{1}{2}(n - \frac{1}{2}) = 1/12$.
This means that the largest comparison constant we may choose in this
region without increasing $\epsilon$ to be greater than four is
$\frac{1}{2}(n - \frac{1}{2}) + 1/16$. If we design the logic such that
$rp_j = \frac{1}{2}(n - \frac{1}{2}) + 1/16$ and $d = 1/2$ selects
$q_{j+1} = n$, then the $\Delta d$ and $\Delta p$ cost calculations are
as follows:

In this case

$$2^{-\delta} \le \Delta d_{max}$$

$$2^{-\epsilon} \le 7/48 - 2^{-\delta}(n - 2/3)$$

In the same manner as that outlined in the last section we obtain Table
3 and the three cases.

Case 1
    $N_p = 2k + 4$
    $N_d = 2k + 3$

Case 2
    $N_p = 2k + 4$
    $N_d = 2k + 4$

Case 3
    $N_p = 2k + 3$
    $N_d = 2k + 5$
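The inequality for $\epsilon$ given above for the table look-up case can be evaluated numerically in the same way as for the arithmetic model. In the Python sketch below, the smallest usable $\delta$ is taken to be the first one for which the right-hand side $7/48 - 2^{-\delta}(n - 2/3)$ is positive. That choice is an assumption made for illustration, since the bound on $\Delta d$ is not written out for this case; the function name is hypothetical.

```python
from fractions import Fraction

def table_lookup_costs(k, count=3):
    """Rows (delta, epsilon, N_d, N_p) for the table look-up model."""
    r = 2 ** (2 * k)
    n = Fraction(2 * (r - 1), 3)
    rows = []
    delta = 1
    while len(rows) < count:
        margin = Fraction(7, 48) - Fraction(1, 2 ** delta) * (n - Fraction(2, 3))
        if margin > 0:
            # Smallest epsilon with 2**-epsilon <= margin.
            eps = 0
            while Fraction(1, 2 ** eps) > margin:
                eps += 1
            rows.append((delta, eps, delta, eps + 2 * k))
        delta += 1
    return rows

for k in range(1, 5):
    print(k, table_lookup_costs(k))
```

Under this assumption the rows for k = 4 give $(N_d, N_p)$ = (11, 12), (12, 12), and (13, 11), and the first row for k = 1 gives $N_d = 4$, $N_p = 6$.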

     k      r      n    δ_min      δ      ε     N_d    N_p

     4    256    170     11       11      4     11     12
                                  12      4     12     12
                                  13      3     13     11
                                   m      3      m     11

Table 3. Costs for Table Look-Up Models

The first entry, $N_d = 4$, $N_p = 6$, is not included in the above
linear equations but this is the most practical case for k = 1, radix
four. By comparison with the results of Section 2.6.2, note that for a
given k, a case may be found for which a table look-up model requires
fewer bits of comparison than the corresponding arithmetic model.

2.7 Quotient Conversion

The quotient developed by SRT division will in general include negative
digits and eventually must be converted to a conventional binary form.
This conversion time and hardware is the greater part of the price paid
for the accrued advantages of redundancy.

First consider a specific case: conversion of a result produced by a
non-restoring division. Here quotient representation is the same as that
discussed in Section 2.2 except that zero is not an allowable digit. The
algorithm for such a conversion is illustrated in Figure 8. This
conversion may be performed sequentially as the quotient digits are
generated, and thus requires no additional terminal operations. The
digit $q_{j+1}$ is unchanged if it is positive, otherwise it is replaced
by $r + q_{j+1}$, and the adjacent higher order digit $q_j$ decreased by
1. Note that since zero is not a permissible digit, there is no
requirement for a borrow propagation in decreasing $q_j$ by 1. The
hardware required is of the order of a two digit subtractor.
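The sequential conversion just described can be sketched in Python as follows. The sketch assumes digits arrive most significant first, that every digit is non-zero with magnitude less than $r$, and that the first digit is positive; the function name is hypothetical.

```python
def convert_nonrestoring(digits, r):
    """Convert non-zero signed digits (most significant first) on the fly.

    A positive digit is kept as it is. A negative digit q is replaced by
    r + q and the adjacent higher order digit is decreased by 1.
    """
    out = []
    for q in digits:
        if q == 0 or abs(q) >= r:
            raise ValueError("digit out of range for non-restoring form")
        if q > 0:
            out.append(q)
        else:
            if not out:
                raise ValueError("leading digit must be positive")
            # The previous digit is at least 1 here, so no borrow can
            # propagate any further to the left.
            out[-1] -= 1
            out.append(r + q)
    return out
```

For example, the decimal digits 3, -4, 5 represent 300 - 40 + 5 = 265, and `convert_nonrestoring([3, -4, 5], 10)` yields `[2, 6, 5]`. Only the newest digit and its left neighbor are ever touched, which is why a two digit subtractor suffices.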

It is not generally possible, however, to perform SRT division not
allowing $q = 0$. Non-restoring division may be viewed as SRT division
with $n = r-1$. For this case, the q(0) region of a P-D plot is
completely overlapped by the q(1) and q(-1) regions. The quotient digit
value $q = 0$ may, therefore, be eliminated and the conversion
consequently simplified to that of Figure 8. For cases of SRT division
with $n < r-1$, the q(0) region is not subsumed by other regions and
thus $q = 0$ must be allowed if the division is to be completely
defined.

[Figure 8. Quotient conversion for non-restoring division]

With the possibility of $q = 0$, the conversion is complicated: the
algorithm of Figure 8 is no longer adequate, for now the difference
$q_j - 1$ may require a borrow from $q_{j-1}$. Furthermore, this borrow
must propagate to the left until it encounters a non-zero digit. This
potential for borrow propagation requires that the equivalent of a full
precision subtractor be available to the quotient register if
conversion is to occur as the quotient digits are generated.
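The effect of zero digits on the borrow can be seen in a small Python sketch. It is written as a single pass from the least significant digit upward, which is a software convenience rather than a description of the hardware, and it assumes the represented value is non-negative; the function name is hypothetical.

```python
def convert_with_borrow(digits, r):
    """Convert signed digits in [-(r-1), r-1], zeros allowed (MSD first).

    Works from the least significant digit upward so that a borrow can
    ripple through any run of zero digits.
    """
    out = []
    borrow = 0
    for q in reversed(digits):
        v = q - borrow
        borrow = 1 if v < 0 else 0
        out.append(v + r * borrow)
    if borrow:
        raise ValueError("represented value is negative")
    return out[::-1]
```

With radix 4, the digits 1, 0, 0, -1 represent 64 - 1 = 63, and the conversion yields `[0, 3, 3, 3]`: the borrow created by the last digit passes through both zeros before it is absorbed by the leading 1.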

Alternately, the full precision quotient may be generated and stored in
the redundant form and then converted during an extra terminal step. A
high-speed arithmetic unit frequently employs a redundant representation
of the partial product during multiplication, e.g. carry-save adders,
which also require a terminal conversion. One possibility, then, is to
share the hardware for conversion of both products and quotients. The
sample implementation presented in the next chapter incorporates this
approach.

3. IMPLEMENTATION OF SRT DIVISION

3.0 Introduction

Armed with the theory and techniques unfolded in the last chapter, now
consider an example implementation of SRT division. This example is not
presented as a detailed construction proposal, but is rather intended to
contribute the following:

1.  A  description  of  several  fairly  general  considerations 
for  implementing  digital  division  and  of  how  SRT  division 
meshes  within  these  considerations. 

2.  An  elaboration,  in  a  rather  concrete  way,  of  the  concept 
of  limited  precision  modeling. 

3.  A  notion  as  to  the  hardware  demands  and  operation  time 
of  functional  blocks  required  in  implementing  SRT 
division. 

Throughout  this  chapter,  it  is  assumed  that  the  designer  has 
already  made  the  decisions  as  to  the  speed  of  the  electronic  components 
he  will  use,  and  that  now  he  is  attempting  to  organize  these  components 
into  a  faster,  more  efficient  system. 

3.1 General Considerations for Implementation

Chapter 2 introduced a class of division techniques which appear
especially suited for implementation in a digital machine. Having
accepted this premise and having decided to tackle SRT division, the
designer is still faced with many decisions and dirty design details.
These details are strongly related to the structure of the allied parts
of the arithmetic unit and to such real life questions as available
logic, speed demands, available packaging space, and to a large extent
to the price the designer is willing to pay for a high-speed divide. A
thorough exploration into these factors is well beyond the scope of this
paper; however, there are several more general guidelines which may
apply.

3.1.1 Relative Occurrence of Division

The first guideline emerges from the observation that division is
usually the least frequently executed of the basic arithmetic
operations: add, subtract, multiply, and divide. The designers of the
IBM STRETCH computer [6] estimated that on an average, out of 16
operations of a general purpose computer, the relative occurrence by
operation type is as follows:

1  division 

3  multiplications 

6  additions 

6   control  transfers 

These  figures  indicate  that  the  designer  should  pay  more  to 
accelerate  multiplication  than  division:   that  in  a  conflict  between 
accelerating  multiplication  and  division,  the  former  should  be  the 
victor. 
3.1.2  Acceleration  of  Division 

With decreasing hardware costs, increasing packaging density, and
demands for still faster arithmetic units, the first guideline may not
be as significant as it was in the days of STRETCH. Today the designer
will probably aim both for very high-speed multiply and divide. The
design question is not merely how to implement division, but rather, how
to implement high-speed division, or yet more specifically, high-speed
SRT division.

The next guidelines, therefore, relate to organizational factors
affecting the speed of execution of division. Of course, in selecting
the SRT method, the designer has already seized upon the possibility of
accelerating execution by decreasing the precision and thus reducing the
time required in selecting a quotient digit. There are, however, other
possibilities beyond this fundamental decision.

As  mentioned  in  Section  2.1,  the  recursive  relationship 
points  directly  to  four  possibilities  for  accelerating  division.   A 
fifth,  obvious,  but  important  factor  is  added  here.   These  possibilities 
are  as  follows: 

1.  Decrease the time for forming $rp_j$, i.e. the left shift time.

2.  Decrease the selection time for multiples of the divisor at the
    divisor input to the adder/subtractor.

3.  Decrease the add/subtract time.

4.  Increase the radix and thus decrease the number of cycles required
    to generate a quotient of specified precision.

5.  Decrease the time for selecting a quotient digit, i.e. for
    comparing the divisor and shifted partial remainder.

The first of these is essentially the problem of minimizing the number
of logic stage delays required to transfer and shift the contents of the
secondary rank of the accumulator back to the primary rank.

Similarly, the second item relates primarily to minimizing control delay
in operating a shift gate once a quotient digit is selected.

In approaching the third factor of this list, decreasing the
add/subtract time, the designer is likely to turn to a carry/borrow save
type unit which eliminates propagation until a terminal step. This is a
standard technique in implementing multiplication, but must be
approached cautiously for the case of division.

The necessity for caution arises from the fact that such schemes
actually introduce redundancy into the representation of a sum or
difference and thus, for division, produce a redundant partial
remainder. As mentioned in Section 2.6.2, redundancy in the partial
remainder complicates the quotient selection and, for a practical
scheme, requires that at least part of the partial remainder be
converted to conventional form after each pass through the
subtractor(s).

Increasing the radix, although it does decrease the number of cycles
required, also carries with it some disadvantages. For a fixed $n$ (the
upper limit of a quotient digit) an increase of $r$ decreases the
redundancy $n/(r-1)$ and thus requires either greater precision in
selecting quotient digits, or an increase of $n$. As noted earlier, an
increase in the value of $n$ demands the availability of more multiples
of the divisor and thus more hardware.

The fifth factor is explored further in Section 3.3 with reference to
the selection of the model division.

Note that the question of minimizing control step-up time is largely
beyond the scope of this paper. It is, however, a very real and related
problem to be faced in accelerating an arithmetic process. There is
little efficiency in building a system which operates faster than
control signals can service it.

3.1.3 Compatibility of Division with the Multiplication Scheme

According to the STRETCH statistics mentioned in Section 3.1.1,
multiplications occur half as often as additions. Multiplication,
however, is usually executed as a series of considerably more than two
additions and thus requires the use of acceleration techniques if the
speed of multiplication and addition are to be compatible. These
techniques essentially reduce to the first four of those mentioned in
Section 3.1.2 with the word "divisor" replaced by "multiplicand", "left
shift" replaced by "right shift", and "quotient" by "product." Thus, at
least to a first approximation, acceleration of multiplication and
division are compatible.

A  high-speed  arithmetic  unit  usually  includes  a  substantial 
investment  in  hardware  to  accelerate  the  execution  of  multiplication. 
Hopefully,  much  of  this  investment  may  also  be  used  for  division. 

With this in mind and accepting the premise that acceleration of
division should place second to accelerated multiplication, we adopt the
following strategy: design a high-speed multiplication scheme, then
embed division within it. Although not the ideal, it is, in fact, a
practical strategy which has been used in arithmetic unit design. In a
sense, this guideline summarizes the guidelines mentioned in both of the
previous sections.

3.2 A High-Speed Multiplication Scheme

Having adopted the design strategy "multiply then divide", we must now
propose a high-speed multiplication scheme with which we hope to mesh
division. The description of the scheme will necessarily be at the block
diagram level and will by no means be fully justified. Also, details
such as overflow and handling of the exponent will not be discussed. The
scheme, however, has been studied and, in fact, simulated by the author.
It is similar to that proposed for implementation in the Illinois
Pattern Recognition Computer (Illiac III). The number format to be
handled by this device is assumed to be an 8 byte (8 bits per byte)
normalized floating point number with 1 byte of exponent and 7 bytes of
mantissa.

Figure  9  is  a  simplified  block  diagram  of  the  proposed  unit. 
3.2.1  Notation 

The conventions used in Figure 9 are as follows:

1.  Flipflop registers are denoted by rectangles with the horizontal
    subdivisions indicating bytes. For example, the M register (M REG)
    is 7 bytes (56 bits) long.

2.  Groups of combinatorial logic are shown in circles or rectangles
    with rounded corners. Any gating is represented in terms of
    AND (·), OR (∨), and EXCLUSIVE OR (⊕).

3.  The widest lines indicate a bus for data in SD format (2 bits per
    digit, see Section 3.2.2), the next widest for numbers in
    conventional notation (1 bit per digit).

4.  Gating signal names are of the form F1 F2 X T1 T2 where:

    a.  F1 and F2 (F2 is optional) are the names of the registers from
        which data is transferred.

    b.  X = D if the transfer is direct; i.e. not shifted.
        X = Rn if the data is shifted n places to the right during the
        transfer.
        X = Ln if the data is shifted n places to the left during the
        transfer.

    c.  T1 and T2 (T2 is optional) are the names of the registers to
        which data is transferred from F1 and F2 respectively.

    d.  The concatenation of register names starting with the same
        letter such as UM and US is further abbreviated as UMS.

5.  Examples of gating signal names:

    a.  VDM - Gate the data on the V-Bus directly into the M-Register.

    b.  ML7Y1 - Gate the contents of the M-Register shifted left seven
        positions into the Y input of signed-digit subtractor S1.

    c.  UHQDLHQ is equivalent to the two names UHDLH and UQDLQ.

6.  The label TO MD or FROM MD indicates connections to the Model
    Division to be described in Section 3.3.3.

3.2.2 Description and Operation

As mentioned earlier, multiplication is substantially accelerated by the
use of an adder or adders which eliminates carry propagation until a
terminal step. The "adder" proposed for this model, S1-S4, is actually a
signed-digit subtractor (SDS): it incorporates facilities for postponing
borrow propagation. Actually, the device performs both addition and
subtraction under control of the "NEG" signal. We shall digress a moment
for a brief description of this device.

Each stage of the signed-digit subtractor (SDS), as shown in Figure 10,
is a 3-input, 2-output device together with an interstage connection and
a "NEG" control line. $Y_i$ is a bit of the subtrahend (minuend -
subtrahend = remainder) in conventional binary form. $S_i$ and $X_i$
together comprise the minuend in a redundant notation which will be
called SD format. Each digit of the minuend is of the form $S_i X_i$
where $X_i$ is interpreted as a magnitude, 1 or 0, and $S_i$ as a sign,
0 = +, 1 = -. The SD format digits are therefore represented as follows:

[Figure 10. Stage of a Signed-Digit Subtractor]

The output of the subtractor is in this same format, i.e. $Z_i$ is the
magnitude of the digit, $T_i$ is the sign. $C_i$ and $C_{i-1}$ are
interstage connections and, as may be seen from the logic equations, are
not propagating borrows. Another advantage of SD format is that a number
may be negated merely by complementing the sign (S) bits.

Note that the postponing of borrow propagation is achieved only at the
expense of introducing redundancy into the representation of the result.
Actually two registers, for example US and UM, are required to store a
number in this redundant form.

We must also pay the price of conversion, or assimilation, to
conventional form. This assimilation actually requires a borrow
propagation and one additional subtraction. The propagation is
accelerated by use of look-ahead techniques, but is still rather
time-consuming and expensive. The propagation occurs in the propagation
logic, the output of which is then applied to the Y input of S4 to
produce the assimilated result.

The propagation logic forms the outputs

$$B_{i-1} = B_i \bar{Z}_i \lor T_i Z_i$$

and S4 is used to produce the assimilated result with bits

$$A_i = Z_i \oplus B_i$$

roi 

The  SDS  is  described  in  more  detail  in  reference     . 
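The assimilation step can be sketched in Python as a bit-serial loop. The sketch assumes the borrow recurrence reads $B_{i-1} = B_i \bar{Z}_i \lor T_i Z_i$ with result bits $A_i = Z_i \oplus B_i$, where index $i$ increases toward the least significant digit, and that no borrow enters the least significant position. It ignores the look-ahead acceleration entirely, and the function name is hypothetical.

```python
def assimilate(sign_bits, magnitude_bits):
    """Assimilate an SD-format number to conventional binary (MSB first).

    sign_bits[i] is T_i (1 means negative) and magnitude_bits[i] is Z_i.
    Assumed recurrence: B_(i-1) = B_i AND NOT Z_i  OR  T_i AND Z_i,
    with result bit A_i = Z_i XOR B_i.
    """
    borrow = 0  # no borrow enters the least significant position
    result = []
    for t, z in zip(reversed(sign_bits), reversed(magnitude_bits)):
        result.append(z ^ borrow)
        borrow = (borrow & (1 - z)) | (t & z)
    if borrow:
        raise ValueError("represented value is negative")
    return result[::-1]
```

For the digits +1, 0, -1 (value 4 - 1 = 3), `assimilate([0, 0, 1], [1, 0, 1])` returns `[0, 1, 1]`.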

In the proposed scheme, four of the signed-digit subtractors are
cascaded to provide multiplication, radix 256, i.e. 8 bits of the
multiplier are used simultaneously. The multiplicand is loaded from the
V-BUS into M, the multiplier into UQ. The low order byte of UQ drives
recoding logic which couples to the control lines in the shift array.

This recoding, suggested by Wallace, requires plus and minus multiples
of 128, 64, 32, 16, 8, 4, 2, and 1 times the multiplicand. The multiples
are formed by the shift array; the signs by the NEG controls, i.e. by
adding or subtracting the multiple. The MDY1 input is used only for an
ADD or SUBTRACT instruction, not for MULTIPLY.
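One common recoding with exactly this set of signed multiples is the radix-4 (modified Booth) recoding, in which each pair of multiplier bits becomes a digit in {-2, -1, 0, 1, 2}. The Python sketch below shows that technique for one multiplier byte. Whether it matches the cited recoding in every detail is an assumption, and the function name and the carry convention between bytes are hypothetical.

```python
def recode_byte(byte, carry_in=0):
    """Recode 8 multiplier bits into four digits in {-2, -1, 0, 1, 2}.

    Returns (digits, carry_out) with digits[i] weighted by 4**i, so that
    byte + carry_in == sum(d * 4**i) + 256 * carry_out.
    """
    bits = [(byte >> i) & 1 for i in range(8)]
    digits = []
    prev = carry_in  # plays the role of the bit below the current pair
    for i in range(4):
        lo, hi = bits[2 * i], bits[2 * i + 1]
        digits.append(lo + prev - 2 * hi)
        prev = hi
    return digits, prev
```

Digit $i$ selects plus or minus 1 or 2 times $4^i$ times the multiplicand, that is, multiples (1, 2), (4, 8), (16, 32), and (64, 128), one pair per cascade stage. For example, `recode_byte(255)` gives digits `[-1, 0, 0, 0]` with a carry of 1 into the next byte, since 255 = 256 - 1.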

After passing through the SDS cascade, the contents of LS-LM and LH-LQ
(partial product and multiplier) are shifted right 8 bits back into the
US-UM and UQ Registers. This continues for 8 cycles; the 9th is an
assimilation cycle. Here the product in SD format is applied to the
propagation logic, the output of the propagation logic to S4, and
consequently converted to conventional representation.

Admittedly the scheme just outlined is expensive and in many cases may
not be justified. The designer may wish to choose a similar scheme but
with fewer levels of cascade, i.e. smaller radix. Although the division
scheme to be designed is built upon this radix 256 multiplication
scheme, the techniques and procedures should be easily reducible to a
lower radix case.

Before concluding this section, we must admit a slight diversion from
our design strategy. The reader may have noticed that all four of the
SDS in Figure 9 have been extended to the left one byte. Actually, if
the multiples of M were added in the order 1, 2, 4, 8, 16, 32, 64, 128
rather than the way shown, only S4 would have to be extended a full 8
bits. Since, however, quotient digits are formed most significant first
(the product is formed least significant first) and we wish to use this
same shift array for divide, the arrangement must be as shown. The extra
SDS stages must be included and thus the division scheme has, to some
extent, infringed upon the design of the multiplication scheme.

3.3  Design  of  Division  Scheme 
3.3.1 General

Now  begins  the  task  of  embedding  a  division  scheme  within 
the  multiplication  scheme  described  in  the  last  section.   Since  the 
SDS  cascade  will  perform  both  addition  and  subtraction  of  the  contents 
of  the  M-Register  and  the  number  in  SD  format  in  the  UM-US  Registers, 
the  obvious  extension  is  to  place  the  divisor  in  M  and  the  dividend 
and  subsequent  partial  remainders  in  UM-US.   The  quotient  digits  will 
be  produced  in  redundant  form.   In  this  case  a  logical  choice  would  be 
to  produce  quotient  digits  in  SD  format  so  that  they  may  be  assimilated 
by  the  same  circuits  as  used  in  multiplication.   The  contents  of  UH-UQ 
may  be  gated  to  US-UM  via  UHQDUSM  and  then  assimilated  as  in  the  final 
cycle  of  multiplication.   The  quotient  is  thus  stored  in  UH-UQ:   the 
sign  bits  in  UH  and  magnitude  bits  in  UQ.   Furthermore,  division  with 
the  hardware  will  require  an  8  bit  shift  from  LS-LM  to  US-UM 
(LSML8USM)  and  from  LH-LQ  to  UH-UQ  (LHQL8UHQ) . 


The full precision division is now generally defined. The divisor is
first stored in M, the dividend in UM and the sign of the dividend in
all positions of US. Quotient digits are then formed by a model division
using $d$ and $rp_j$. The quotient digits are stored in SD format in
UH-UQ and also used to set the multiples of the divisor in M to be
subtracted from the dividend. The next partial remainder is formed in
the SDS cascade (S1, S2, S3, S4), stored in LS-LM, and then shifted left
8 bits into US-UM. These cycles continue until the full precision
quotient has been generated. The quotient is then gated directly from
UH-UQ into US-UM, assimilated, and gated into EM where it is available
to the central processing unit.
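Stripped of registers and gating, the cycle described above follows the recursion $p_{j+1} = r p_j - q_{j+1} d$ with $|q_{j+1}| \le n$. The Python sketch below simulates it with exact fractions. It is a simplification: the quotient digit is chosen by rounding the exact ratio $r p_j / d$ (clipped to $\pm n$) rather than by a limited precision model division, and it assumes a positive divisor and a dividend of magnitude at most $\frac{2}{3} d$. The function name is hypothetical.

```python
from fractions import Fraction
import math

def srt_divide(dividend, divisor, k=1, cycles=6):
    """Simulate radix 2**(2k) SRT division with digits in [-n, n].

    Returns (digits, remainder) such that
    dividend == divisor * sum(q_j * r**-j) + remainder * r**-cycles.
    """
    r = 2 ** (2 * k)
    n = 2 * (r - 1) // 3
    d = Fraction(divisor)
    p = Fraction(dividend)
    if d <= 0 or abs(p) > Fraction(2, 3) * d:
        raise ValueError("need divisor > 0 and |dividend| <= 2/3 divisor")
    digits = []
    for _ in range(cycles):
        shifted = r * p                               # form r * p_j
        q = math.floor(shifted / d + Fraction(1, 2))  # rounded quotient
        q = max(-n, min(n, q))                        # largest multiple is n
        p = shifted - q * d                           # next partial remainder
        digits.append(q)
    return digits, p
```

Because every partial remainder stays within $\pm\frac{2}{3} d$, clipping the digit to $\pm n$ never causes the recursion to diverge. With $k = 4$ each cycle yields one radix 256 digit of magnitude at most 170.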

We must now design a model division to select the quotient
digits to be stored in UH-UQ and to be used to control the M-Shift
array in forming a full precision partial remainder. Note that the
division scheme here is of the class with radix $r = 2^{2k}$,
$n = \frac{2}{3}(r-1)$, as mentioned in Section 2.5.2. The number of
cascades, $k$, is 4 in this case. The value of $n$ is the sum of the
maximum multiples of the divisor which may be formed at each stage of
the SDS cascade and here is 128 + 32 + 8 + 2 = 170. The radix point is
between the leftmost and next leftmost byte of the UM-US and LM-LS
Registers.
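
The relation between $k$, $r$, and $n$ is easy to check numerically. The sketch below simply evaluates the two formulas quoted above and compares the $k = 4$ case with the stage multiples listed in the text; the function name is illustrative.

```python
def radix_and_max_digit(k):
    """Return (r, n) for the class r = 2**(2k), n = (2/3)(r - 1)."""
    r = 2 ** (2 * k)
    n = 2 * (r - 1) // 3  # exact, since r - 1 = 4**k - 1 is divisible by 3
    return r, n


# Largest divisor multiple available at each of the four cascade stages,
# as listed in the text.
STAGE_MAX_MULTIPLES = [128, 32, 8, 2]

for k in range(1, 5):
    print(k, radix_and_max_digit(k))
```

For $k = 4$ this gives $r = 256$ and $n = 170$, which agrees with the sum of the stage multiples.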

3.3.2  An  Arithmetic  Model 

First considering an arithmetic model, we select case 3 of
Section 2.5.2 and calculate that for $k = 4$, $N_p = 12$ bits and
$N_d = 13$ bits. The first 12 bits of the shifted partial remainder
could therefore be assimilated into conventional form and divided by
the 13 high order bits of the divisor to produce 8 quotient bits. This
operation could be performed by a non-restoring scheme in auxiliary
hardware such as the exponent arithmetic unit. Since an exponent unit
normally does not perform division, some augmentation is required. The
minimum addition would be a left shift path from the secondary to the
primary rank of the accumulator. Also, since we have specified only a
7 bit exponent, the width of the exponent unit would require an
extension of 5 bits. These additions would, however, be relatively
inexpensive. The exponent unit, which normally sits idle during most of
the division operation, could be used more efficiently.
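
For readers unfamiliar with the non-restoring scheme mentioned above, the following is a minimal integer sketch of the textbook algorithm. It is not a model of the exponent unit's datapath, and it omits the rounding of the model quotient; the function name and argument conventions are assumptions made for illustration.

```python
def nonrestoring_divide(dividend, divisor, nbits):
    """Textbook non-restoring division on non-negative integers.

    Produces nbits quotient digits, each +1 or -1, then applies one
    final correction. Requires 0 <= dividend < divisor * 2**nbits.
    Returns (quotient, remainder) with 0 <= remainder < divisor.
    """
    if divisor <= 0 or not (0 <= dividend < divisor << nbits):
        raise ValueError("operands out of range")
    remainder = dividend
    quotient = 0
    for i in range(nbits - 1, -1, -1):
        if remainder >= 0:
            remainder -= divisor << i   # digit +1: subtract
            quotient += 1 << i
        else:
            remainder += divisor << i   # digit -1: add back, no restore
            quotient -= 1 << i
    if remainder < 0:                   # single correction step at the end
        remainder += divisor
        quotient -= 1
    return quotient, remainder


print(nonrestoring_divide(200, 13, 8))
```

Each step performs exactly one addition or subtraction, so the quotient bits emerge one at a time at a fixed rate.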

There is, however, a major disadvantage to the arithmetic
models: the necessity to round the quotient digits produced by the
model before they are used by the full precision mechanism. This
rounding was mentioned in Section 2.5.2 and is obligatory if the cost
criterion is to hold. Without this requirement the quotient bits
could be used sequentially as they are generated to set the gates of
the M-Shift array. In this case, the full precision divisor would be
formed in LS-LM very shortly after the last quotient bit was produced
by the model. Since, however, the rounding may affect the most
significant bit of the quotient returned from the model, the propagation
through the SDS array cannot begin until the model division is complete.
This restriction severely limits the feasibility of the arithmetic
models. Because of this rounding requirement, a table look-up model
will be used in the example developed here.

3.3.3  A Table Look-Up Model

As described in Section 2.6.3, the round-off problem does not
arise in a table look-up model. The major disadvantage here is hardware
cost and the large fanout requirements of $\hat{d}$ and $r\hat{p}_j$ to
the selection logic. In the example arithmetic unit being developed
here, multiplication is radix 256. For compatibility we would also like
division to be radix 256 and, consequently, would like a radix 256
table look-up model which would produce 8 bits of the quotient in
parallel. By considering a P-D plot for radix 256, n = 170, or merely
the fact that $N_p$ = 12 bits and $N_d$ = h bits, the reader may quickly
convince himself that the hardware requirements for such a scheme are
prohibitive, at least with conventional logic.

A radix 16 table look-up is probably possible with integrated
circuitry and perhaps with more conventional circuitry if the designer
is willing to pay the price: approximately 250 5-input NANDs; 160
8-input NANDs; 250 8-input NORs; and 160 drivers which will drive up
to 50 NOR loads.

In this example we will adopt a more modest approach,
implementing a radix 4 table look-up and applying it successively at
four positions of the SDS cascade. In a sense, we have been forced to
reduce the radix 256 division to four radix 4 divisions.
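
The equivalence between one radix 256 digit and four radix 4 digits can be checked by enumeration. The sketch below assumes each stage selects a digit from {-2, ..., 2}, which is consistent with the stage multiples 128, 32, 8, and 2 quoted earlier; the names are illustrative.

```python
from itertools import product

RADIX4_DIGITS = (-2, -1, 0, 1, 2)


def compose(digits):
    """Combine radix-4 digits (most significant first) into one integer."""
    value = 0
    for q in digits:
        value = 4 * value + q
    return value


# Every radix-256 digit value reachable by four radix-4 stages.
reachable = {compose(ds) for ds in product(RADIX4_DIGITS, repeat=4)}
print(min(reachable), max(reachable), len(reachable))
```

The 625 digit combinations cover only 341 distinct values, every integer from -170 to 170. The surplus is the redundancy that allows each digit to be chosen from truncated operands.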

From Section 2.5.3 a radix 4 table look-up model requires
$N_d = 4$, $N_p = 6$. The 6 bits of the partial remainder are supplied
sequentially from four stages of the full precision hardware labelled
"TO MD" in Figure 9. The first stage is the output of US-UM, the other
three the outputs of S1, S2, and S3. The high order bit supplied
to the model is displaced 2 bits right at each stage. Thus if the
subscript 1 denotes the high order digit position, the first
$r\hat{p}_j$ to the model is $US_1$, $UM_1$ through $US_6$, $UM_6$. The
second input is the third through eighth output of S1, etc.
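
The positions sampled at each stage follow directly from the 6-bit width and the 2-bit displacement. A small sketch, with stage 0 taken as the output of US-UM and stages 1 to 3 as the outputs of S1 to S3 (the numbering of stages from 0 is a convention chosen here, not the report's):

```python
def model_input_window(stage, width=6, step=2):
    """Return the 1-based digit positions fed to the model at a stage.

    Stage 0 is the output of US-UM; stages 1..3 are the outputs of
    S1..S3. Position 1 is the high order digit position.
    """
    first = 1 + step * stage
    return list(range(first, first + width))


for stage in range(4):
    print(stage, model_input_window(stage))
```

Stage 1 yields positions 3 through 8, matching the "third through eighth output of S1" in the text.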

A block diagram of the proposed table look-up model is shown
in Figure 11 and described in Table 4. The P-D plot which is actually
implemented is shown in Figure 12. Table 5 explicitly illustrates the
quotient digit selection for each $r\hat{p}_j$ and $\hat{d}$. Note the
correspondence between the steps in the overlap regions of Figure 12
and the steps shown in the table.

Before studying these figures and tables note the following
considerations which are incorporated in the design:

   1.  Only the first quadrant of the P-D plot is actually
       implemented. The approximations $\hat{d}$ and $r\hat{p}_j$ are
       considered to be positive and the real sign is computed as with
       a sign-magnitude representation. If $r\hat{p}_j$ is negative
       when presented to the model, it is made positive before
       assimilation by complementing the sign bits.

   2.  The divisor, and thus the selected divisor interval, is a
       constant for a given division; the speed of selecting the
       divisor interval is therefore much less critical than that of
       forming the partial remainder interval.

   3.  The QUOTIENT SELECT TABLE actually implements the ZERO and
       TWO regions of the P-D plot in Figure 12 and forms ONE as
       $\overline{\text{ZERO}} \cdot \overline{\text{TWO}}$. The TWO
       and ZERO regions are easier to implement than the ONE region
       since they are bounded on one side by the range restrictions
       on $r\hat{p}_j$.
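
The selection logic in item 3 and the recurrence it drives can be sketched in software. This is only an illustration under stated assumptions: it compares exact values rather than the truncated 6-bit and 4-bit approximations, and the thresholds d/2 and 3d/2 are chosen here from inside the overlap regions rather than read from Table 5. The function names are hypothetical.

```python
from fractions import Fraction


def select_digit(rp, d):
    """Pick a radix-4 quotient digit in {-2,...,2} for shifted remainder rp.

    Only the first quadrant is decided; the sign is restored afterwards.
    """
    mag = abs(rp)
    zero = mag < d / 2               # ZERO region (threshold assumed)
    two = mag >= 3 * d / 2           # TWO region (threshold assumed)
    one = (not zero) and (not two)   # ONE formed from the other two
    digit = 2 if two else (1 if one else 0)
    return -digit if rp < 0 else digit


def srt_radix4(x, d, steps):
    """Divide x by d with `steps` radix-4 digits; return (digits, remainder).

    Requires |x| <= (2/3) d so the first shifted remainder is in range.
    """
    p = Fraction(x)
    d = Fraction(d)
    digits = []
    for _ in range(steps):
        rp = 4 * p
        q = select_digit(rp, d)
        p = rp - q * d
        assert abs(p) <= Fraction(2, 3) * d  # range restriction on p
        digits.append(q)
    return digits, p


digits, p = srt_radix4(Fraction(1, 3), Fraction(3, 4), 8)
print(digits)
```

The internal assertion checks that every new partial remainder stays within (2/3)d in magnitude, which is the range restriction that makes the next selection possible.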

The inputs to the model and the controls are supplied from
the full precision unit as shown in Figure 9 and are designated as
follows:

   i, j        =  integer subscripts.
   $US_j$      =  the true output of the j-th position of the US
                  Register containing the sign bits of the partial
                  remainder.
   $UM_j$      =  the true output of the j-th position of the UM
                  Register containing the magnitude bits of the
                  partial remainder.
   $T_{i,j}$   =  the j-th sign bit of the output of signed-digit
                  subtractor Si.
   $Z_{i,j}$   =  the j-th magnitude bit of the output of signed-digit
                  subtractor Si.
   $M_j$       =  the true output of the j-th position of the M
                  Register containing the divisor. M is the sign
                  of the divisor.
   $C_i$       =  sequence control signals.
   $\Sigma$    =  logical summation (OR).
   $\Pi$       =  logical product (AND).

The other symbols used in Figure 11 are defined in Table 4.

FIGURE 12.  P-D PLOT FOR TABLE LOOK-UP MODEL

[Table 5 body is not recoverable here. Rows: values of $r\hat{p}_j$
from 00.0000 to 01.1100 in steps of 00.0001, and 10.1100. Columns:
divisor $\hat{d}$ = .1000, .1001, .1010, .1011, .1100, .1101, .1110,
.1111. Entries: the quotient digit selected (0, 1, or 2).]

Table 5.  Quotient Select Table.

3.4  Estimate of Speed of Execution

Although  in  this  report  we  have  described  the  division 
scheme  only  at  the  block  diagram  level,  a  detailed  simulation  has  been 
programmed  and  will  be  available  in     .   Based  upon  this  simulation 
and  actual  logic  design  of  the  arithmetic  unit  of  Illiac  III  we  can 
estimate  the  execution  time  of  this  division  scheme  in  terms  of 
transistor collector delays. The actual logic is of the direct-coupled
saturated DTL type.

Table 6 summarizes the number of transistor collector delays
associated with operation of each block of the model division, Figure
11, and with the relevant blocks of the complete arithmetic unit shown
in Figure 9. These figures are used in Table 7 in tracing the
operations involved in performing one division cycle, i.e., making one
pass through the SDS cascade and producing 8 quotient digits in SD
format. The final cycle assimilates the redundantly represented
quotient as described under ASSIMILATION.

To estimate the execution time in seconds we shall assume a
collector delay of 15 ns; thus 8 bits of quotient require 76 x 15 ns,
or about 1.1 μsec. A 56 bit division such as proposed for Illiac III
therefore requires 7.7 μsec plus 0.3 μsec for assimilation, or a total
of 8 μsec. Initial and terminal shifting of operands have not been
included but represent a negligible time compared to the execution time
of the recursive operations.
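
The estimate can be reproduced with a few lines of arithmetic. The sketch below takes the figures quoted in the text and in Table 7 as given (76 collector delays per cycle, 18 for assimilation, 15 ns per delay); the function name is illustrative only.

```python
COLLECTOR_DELAY_NS = 15      # assumed delay per transistor collector
DELAYS_PER_CYCLE = 76        # one pass through the cascade, 8 quotient bits
DELAYS_ASSIMILATION = 18     # final assimilation cycle


def division_time_ns(quotient_bits, bits_per_cycle=8):
    """Estimated execution time in ns, excluding operand shifting."""
    cycles = -(-quotient_bits // bits_per_cycle)  # ceiling division
    total_delays = cycles * DELAYS_PER_CYCLE + DELAYS_ASSIMILATION
    return total_delays * COLLECTOR_DELAY_NS


print(division_time_ns(56) / 1000, "microseconds")
```

Without intermediate rounding the 56 bit case comes to 8.25 μsec, in line with the rounded total of about 8 μsec.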

BLOCK                                          NUMBER OF
                                               COLLECTOR DELAYS

Model Division (Figure 11)
   Input Gating                                       2
   Sign Detect                                        1
   Negate                                             1
   Borrow Generate                                    3
   Quotient Select Table                              2
   Quotient Storage and Shift Control                 3

   Total for Model per 2 Digits of Quotient          12

Full Precision Division (Figure 9)
   Signed-Digit Subtracter (Each)
      (S1, S2, S3, S4)                                3
   M-Shift Gates (including Driver)                   3
   Register to Register Transfer                      2
   Propagation Logic                                  7

Table 6.  Transistor Collector Delays of Blocks of the Division Scheme

Initial Conditions:  Divisor in M-Register.  Dividend in UM-Register.
                     Sign of Dividend in All Positions of US-Register.

EVENT                                          NUMBER OF
                                               COLLECTOR DELAYS

QUOTIENT GENERATION
   Perform Model Division                            12
   Set ML7Y1 or ML6Y1                                 3
   Perform Add/Subtract in S1                         3
   Perform Model Division                            12
   Set ML5Y2 or ML4Y2                                 3
   Perform Add/Subtract in S2                         3
   Perform Model Division                            12
   Set ML3Y3 or ML2Y3                                 3
   Perform Add/Subtract in S3                         3
   Perform Model Division                            12
   Set ML1Y4 or MDY4                                  3
   Perform Add/Subtract in S4                         3
   Store Result in LS-LM                              2
   Left Shift via LSML8USM                            2

   Total Time per 8 Bits of Quotient                 76

ASSIMILATION
   Gate Quotient in UH-UQ to US-UM via UHQDUSM        2
   Direct through S1                                  4
   Generate Borrows in Propagation                    7
   Assimilate to Conventional Form in S4              3
   Store in LM                                        2

   Total Time for Assimilation                       18

Table 7.  Transistor Collector Delays in Execution of Division.
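
The cycle total can be rebuilt from the block delays as a consistency check. In the sketch below the delay values come from Tables 6 and 7; the mapping of Table 7 events onto Table 6 blocks (each "Set ML..." step as one M-Shift gate delay, the store and the left shift as one register transfer each) is a reading adopted here for illustration.

```python
# Block delays from Table 6, in transistor collector delays.
MODEL_BLOCKS = {
    "input gating": 2,
    "sign detect": 1,
    "negate": 1,
    "borrow generate": 3,
    "quotient select table": 2,
    "quotient storage and shift control": 3,
}
SUBTRACTER = 3          # each of S1..S4
M_SHIFT_GATES = 3       # including driver
REGISTER_TRANSFER = 2   # register to register

model = sum(MODEL_BLOCKS.values())
per_stage = model + M_SHIFT_GATES + SUBTRACTER
# Four stages, then the store in LS-LM and the 8-bit left shift.
cycle = 4 * per_stage + 2 * REGISTER_TRANSFER

print(model, per_stage, cycle)
```

The model accounts for 48 of the 76 delays in a cycle.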

4.  SUMMARY AND CONCLUSION

4.1  Summary

The  first  half  of  this  report  was  largely  a  constructive 
definition  of  SRT  division.   It  introduced  a  recursive  relationship 
defining  division,  a  representation  of  the  quotient  allowing  both 
positive  and  negative  digits,  and  range  restrictions  on  the  partial 
remainders.   It  was  then  shown  that  the  consequence  of  this  quotient 
representation  and  range  restriction  was  that  correct  quotient  digits 
could  be  selected  by  inspection  of  truncated  versions  of  the  divisor 
and  shifted  partial  remainders.   The  P-D  plot  was  described  and  used 
as  a  key  tool  in  the  development. 

Next, the report turned to the more specific task of
determining the number of bits necessary in these approximations. The
cost criterion was stated as the fundamental requirement on the
precision of inspection. Although this criterion is general, to obtain
numerical results the discussion was restricted to a radix of the form
$r = 2^{2k}$ and to models of the arithmetic or table look-up type. The
chapter concluded with a short discussion of the conversion of the
redundantly represented numbers to conventional form.

The  second  major  section  of  the  report  attempted  to  relate 
the  equations,  graphs,  and  statements  of  the  first  section  to  real- 
world  problems  of  designing  a  digital  arithmetic  unit.   It  described 
some  general  design  considerations  and  pointed  to  compatibility  of 
division  with  multiplication  as  one  of  the  most  important. 
At this point, the discussion of division digressed to one of
proposing  a  multiplication  scheme  and  to  the  block  structure  of  an 
arithmetic  unit  with  which  it  could  be  realized.   The  focus  then 
returned  to  division  where,  after  rejecting  an  arithmetic  model,  a 
table  look-up  model  division  was  proposed. 

The  model  was  described  at  the  black-box  level  and  some 
estimate  was  given  as  to  the  expected  operation  time  of  such  a  scheme 
implemented  with  conventional  DTL. 


4.2  Conclusion


To  a  large  extent,  this  report  has  been  directed  to  the 
designer  faced  with  the  task  of  implementing  digital  division.   The 
mode  of  presentation,  however,  has  not  been  intended  to  be  of  an 
algorithmic  style,  but  is  rather  aimed  at  a  basic  understanding  of 
SRT  division  in  hopes  that  the  designer  will  be  able  to  adapt  it  to 
his particular specifications and hardware. The chapter on
implementation was included merely to indicate one way of applying SRT
division.

The author also hopes that this report will support
exploration into development of higher radix quotient selection models,
e.g. a true radix 256 model which can select 8 quotient bits in
parallel. Note that the operating speed of the model in the example
implementation is by far the slowest link.


Much of the delay in quotient select is, however, chargeable
to the necessity for assimilating the redundantly represented
$p_j$. It would therefore appear appropriate to explore models which
could select quotients directly from a redundantly represented partial
remainder. Perhaps this could be accomplished with analog techniques
in which $rp_j$ was converted to a voltage proportional to the weighted
sum of the bits. Such a converter could handle both plus and minus
weights. It may also be possible to mitigate the round-off problem
associated with the arithmetic models. The P-D plot could then be
implemented with analog-digital rather than strictly digital circuits.

Also  note  that  the  form  of  the  quotient  selected  by  the  model 
in  the  example  implementation  is  by  no  means  unique.   In  this  case,  the 
SD  format  was  selected  so  as  to  be  compatible  with  the  M-Shift  Array 
control signals and the assimilation circuitry used for
multiplication. There may, however, be more efficient recodings. Perhaps the
goals  could  best  be  summarized  as  attempting  to  implement  division  so 
that  it  is  actually  performed  as  the  inverse  of  multiplication. 